Abstract

We study a Bayesian approach to recovering the initial condition for the heat equation
from noisy observations of the solution at a later time. We consider a class of prior
distributions indexed by a parameter quantifying 'smoothness' and show that the corresponding
posterior distributions contract around the true parameter at a rate that depends on the
smoothness of the true initial condition and the smoothness and scale of the prior. Correct
combinations of these characteristics lead to the optimal minimax rate. One type of prior
leads to a rate-adaptive Bayesian procedure. The frequentist coverage of credible sets is
shown to depend on the combination of the prior and true parameter as well, with smoother
priors leading to zero coverage and rougher priors to (extremely) conservative results. In the
latter case credible sets are much larger than frequentist confidence sets, in that the ratio
of diameters diverges to infinity. The results are numerically illustrated by a simulated data
example.



1 Introduction 



Suppose a differential equation describes the evolution of some feature of a system (e.g., heat
conduction), depending on its initial value (at time $t = 0$). We observe the feature at time $T > 0$,
in the presence of noise or measurement errors, and the aim is to recover the initial condition.
Inverse problems of this type are often ill-posed in the sense that the solution operator of the
differential equation, which maps the function describing the initial state to the function that
describes the state at the later time $T > 0$ at which we observe the system, typically does
not have a well-behaved, continuous inverse. This means that in many cases some form of
regularization is necessary to solve the inverse problem and to deal with the noise.

In this paper we study a Bayesian approach to this problem for the particular example of
recovering the initial condition for the heat equation. Specifically, we assume we have noisy
observations of the solution $u$ to the Dirichlet problem for the heat equation

$$\frac{\partial}{\partial t} u(x,t) = \frac{\partial^2}{\partial x^2} u(x,t), \qquad u(x,0) = \mu(x), \qquad u(0,t) = u(1,t) = 0, \tag{1.1}$$

where $u$ is defined on $[0,1] \times [0,T]$ and the function $\mu \in L^2[0,1]$ satisfies $\mu(0) = \mu(1) = 0$. The
solution to (1.1) is given by

$$u(x,t) = \sqrt{2} \sum_{i=1}^\infty \mu_i e^{-i^2\pi^2 t} \sin(i\pi x),$$

where $(\mu_i)$ are the coordinates of $\mu$ in the basis $e_i(x) = \sqrt{2}\sin(i\pi x)$, for $i \ge 1$. In other words, it
holds that $u(\cdot,T) = K\mu$, for $K$ the linear operator on $L^2[0,1]$ that is diagonalized by the basis
$(e_i)$ and that has corresponding eigenvalues $\kappa_i = \exp(-i^2\pi^2 T)$, for $i \ge 1$. We assume we observe
the solution $K\mu$ in white noise of intensity $1/n$. By expanding in the basis $(e_i)$ this is equivalent
to observing the sequence of noisy, transformed Fourier coefficients $Y = (Y_1, Y_2, \ldots)$ satisfying

$$Y_i = \kappa_i \mu_i + \frac{1}{\sqrt{n}} Z_i, \qquad i = 1, 2, \ldots, \tag{1.2}$$

for $(\mu_i)$ and $(\kappa_i)$ as above, and $Z_1, Z_2, \ldots$ independent, standard normal random variables. The
aim is to recover the coefficients $\mu_i$, or equivalently, the initial condition $\mu = \sum_{i=1}^\infty \mu_i e_i$, under
the assumption that the signal-to-noise ratio tends to infinity (so $n \to \infty$).
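
As a rough illustration of the sequence model (1.2), the following Python sketch simulates finitely many noisy coefficients $Y_i = \kappa_i \mu_i + Z_i/\sqrt{n}$ with $\kappa_i = \exp(-i^2\pi^2 T)$. The function names `kappa` and `simulate_observations`, the truncation to finitely many coordinates, the seeded generator and the example coefficients are assumptions made for illustration only; they are not taken from the paper.

```python
import math
import random


def kappa(i, T):
    """Eigenvalue kappa_i = exp(-i^2 pi^2 T) of the heat operator K."""
    return math.exp(-(i ** 2) * math.pi ** 2 * T)


def simulate_observations(mu, n, T, rng):
    """Draw Y_i = kappa_i * mu_i + Z_i / sqrt(n) for i = 1, ..., len(mu).

    `mu` holds the first len(mu) coefficients (a truncation, assumed here
    because the full sequence is infinite).
    """
    return [
        kappa(i, T) * m + rng.gauss(0.0, 1.0) / math.sqrt(n)
        for i, m in enumerate(mu, start=1)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, for reproducibility only
    mu = [1.0 / i ** 2 for i in range(1, 11)]  # hypothetical coefficients
    y = simulate_observations(mu, n=10 ** 4, T=0.1, rng=rng)
    # Print index, noiseless signal kappa_i * mu_i, and noisy observation Y_i.
    for i, (m, yi) in enumerate(zip(mu, y), start=1):
        print(i, kappa(i, 0.1) * m, yi)
```

Because the $\kappa_i$ decay like $e^{-i^2\pi^2 T}$, the signal $\kappa_i\mu_i$ falls below the noise level $1/\sqrt{n}$ within the first few coordinates for moderate $n$, which is the source of the severe ill-posedness.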

This heat conduction inverse problem has been studied in the frequentist literature (see, e.g.,
Bissantz and Holzmann, 2008; Cavalier, 2008, 2011; Golubev and Khas'minskii, 1999; Mair,
1994; Mair and Ruymgaart, 1996) and has also been addressed in a Bayesian framework (with
additional assumptions on the noise), cf. Stuart (2010). For more background on how this
backward heat conduction problem arises in practical problems, see for instance Beck et al. (2005)
or Engl et al. (1996), and the references therein. Since the $\kappa_i$ decay in a sub-Gaussian manner
the estimation of $\mu$ is very hard in general. It is well known for instance that the minimax rate
of estimation for $\mu$ in a Sobolev ball of regularity $\beta$ (see Sec. 1.1) relative to the $\ell^2$-loss is only
$(\log n)^{-\beta/2}$. This rate is attained by various methods, including generalized Tikhonov
regularization and spectral cut-off (Bissantz and Holzmann, 2008; Mair, 1994; Mair and Ruymgaart,
1996; Golubev and Khas'minskii, 1999).

Convergence rates for Bayesian methods for problems like (1.2) have only been studied for the
case that $\kappa_i$ decays like a power of $i$, see Knapik et al. (2011). In this paper, like in Knapik et al.
(2011), we put product priors of the form

$$\Pi = \bigotimes_{i=1}^\infty N(0, \lambda_i) \tag{1.3}$$

on the sequence $(\mu_i)$ and study the corresponding sequence of posterior distributions. The results
we obtain are different from the ones in Knapik et al. (2011) in a number of ways however. First
of all, it is in this case not true that to obtain optimal contraction rates for the posterior, we
need to match the regularities of the true sequence $\mu_0$ and the prior exactly. Any degree of
oversmoothing will do as well. Moreover, if the prior variances $\lambda_i$ are chosen sub-Gaussian,
then we obtain the optimal rate $(\log n)^{-\beta/2}$ for any $\beta$-regular $\mu_0$, i.e., we obtain a rate-adaptive
procedure. Unfortunately however, these very smooth priors behave badly from another point
of view. We show that asymptotically, the frequentist coverage of credible sets based on these
priors is 0 for a very large class of true $\mu_0$'s. As in Knapik et al. (2011) we see that asymptotic
coverage 1 is obtained when the prior is less regular than the truth. The radius of a credible set
is in that case however of a strictly larger order than the radius of the corresponding frequentist
confidence set, which is another difference with the findings in Knapik et al. (2011) for polynomially
decaying $\kappa_i$.
These statements are made precise and are refined to include the possibility of rescaling the
priors in Sec. 2. On a qualitative level, the conclusion of the results must be that in the severely
ill-posed case that we study in this paper it is advisable to use a prior that is slightly less regular
than the truth, just as in the mildly ill-posed case of Knapik et al. (2011). Unfortunately, the
corresponding Bayesian credible sets can be very large in the present setting and hence of limited
use. The results in Sec. 2 all deal with the recovery of the full parameter $\mu$. In Sec. 3 we derive
the analogous results for the problem of estimating linear functionals of $\mu$. The results are
numerically illustrated in Sec. 4. Sec. 5 contains proofs of the results presented in Secs. 2 and
3. Auxiliary lemmas are presented in Sec. 6.
1.1 Notation

For $\beta > 0$, the Sobolev norm $\|\mu\|_\beta$ and the $\ell^2$-norm $\|\mu\|$ of an element $\mu \in \ell^2$ are defined by

$$\|\mu\|_\beta^2 = \sum_{i=1}^\infty \mu_i^2 i^{2\beta}, \qquad \|\mu\|^2 = \sum_{i=1}^\infty \mu_i^2,$$

and the corresponding Sobolev space by $S^\beta = \{\mu \in \ell^2 : \|\mu\|_\beta < \infty\}$.

For two sequences $(a_n)$ and $(b_n)$ of numbers, $a_n \asymp b_n$ means that $|a_n/b_n|$ is bounded away
from zero and infinity as $n \to \infty$, $a_n \lesssim b_n$ means that $a_n/b_n$ is bounded, $a_n \sim b_n$ means that
$a_n/b_n \to 1$ as $n \to \infty$, and $a_n \ll b_n$ means that $a_n/b_n \to 0$ as $n \to \infty$. For two real numbers $a$
and $b$, we denote by $a \vee b$ their maximum, and by $a \wedge b$ their minimum.



2 Recovering the full parameter 

Under the model (1.2) and the prior (1.3) the coordinates $(\mu_{0,i}, Y_i)$ of the vector $(\mu_0, Y)$ are
independent, and hence the conditional distribution of $\mu_0$ given $Y$ factorizes over the coordinates
as well. Thus the computation of the posterior distribution reduces to countably many posterior
computations in conjugate normal models. It is straightforward to verify that the posterior
distribution $\Pi_n(\cdot \mid Y)$ is given by

$$\Pi_n(\cdot \mid Y) = \bigotimes_{i=1}^\infty N\Bigl(\frac{n\lambda_i\kappa_i}{1 + n\lambda_i\kappa_i^2}\, Y_i,\ \frac{\lambda_i}{1 + n\lambda_i\kappa_i^2}\Bigr). \tag{2.1}$$
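
The coordinatewise conjugate update can be sketched in a few lines of Python. For each coordinate the posterior is normal with mean $n\lambda_i\kappa_i Y_i/(1 + n\lambda_i\kappa_i^2)$ and variance $\lambda_i/(1 + n\lambda_i\kappa_i^2)$. The function name `posterior_coordinates` and the use of plain lists for a truncated sequence are assumptions for illustration.

```python
import math


def posterior_coordinates(y, lam, kap, n):
    """Posterior means and variances of mu_i given Y_i in the conjugate normal model.

    y, lam, kap: equally long lists with observations Y_i, prior variances
    lambda_i and eigenvalues kappa_i (a finite truncation, assumed here).
    """
    means, variances = [], []
    for yi, li, ki in zip(y, lam, kap):
        denom = 1.0 + n * li * ki ** 2
        means.append(n * li * ki * yi / denom)
        variances.append(li / denom)
    return means, variances


if __name__ == "__main__":
    T, n, alpha = 0.1, 10 ** 4, 1.0
    idx = range(1, 6)
    kap = [math.exp(-(i ** 2) * math.pi ** 2 * T) for i in idx]
    lam = [i ** (-1.0 - 2.0 * alpha) for i in idx]  # polynomial prior, tau_n = 1
    y = [0.3, 0.02, 0.0, 0.01, -0.01]               # hypothetical data
    print(posterior_coordinates(y, lam, kap, n))
```

When $n\lambda_i\kappa_i^2$ is negligible the posterior variance stays at the prior variance $\lambda_i$, so for high frequencies the data carry essentially no information.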

Our first theorem shows that the posterior contracts as $n \to \infty$ to the true parameter at
a rate $\varepsilon_n$ and quantifies how this rate depends on the behavior of the sequence $(\lambda_i)$ of prior
variances and the regularity $\beta$ of the true parameter $\mu_0$. We say the posterior contracts around
$\mu_0$ at the rate $\varepsilon_n$ if

$$E_{\mu_0} \Pi_n\bigl(\mu : \|\mu - \mu_0\| \ge M_n \varepsilon_n \mid Y\bigr) \to 0$$

for every $M_n \to \infty$, where the expectation is under the true model governed by the parameter
$\mu_0$.
Theorem 2.1 Suppose the true parameter $\mu_0$ belongs to $S^\beta$ for $\beta > 0$.

If $\lambda_i = \tau_n^2 i^{-1-2\alpha}$ for some $\alpha > 0$ and $\tau_n > 0$ such that $n\tau_n^2 \to \infty$, then the posterior contracts
around $\mu_0$ at the rate

$$\varepsilon_n = (\log(n\tau_n^2))^{-\beta/2} + \tau_n (\log(n\tau_n^2))^{-\alpha/2}. \tag{2.2}$$

The rate is uniform over $\mu_0$ in balls in $S^\beta$. In particular:

(i) If $\tau_n = 1$, then $\varepsilon_n = (\log n)^{-(\beta \wedge \alpha)/2}$.

(ii) If $n^{-1/2+\delta} \lesssim \tau_n \lesssim (\log n)^{(\alpha-\beta)/2}$, for some $\delta > 0$, then $\varepsilon_n = (\log n)^{-\beta/2}$.

If $\lambda_i = e^{-\alpha i^2}$ for some $\alpha > 0$, then the posterior contracts around $\mu_0$ at the rate

$$\varepsilon_n = (\log n)^{-\beta/2}. \tag{2.3}$$

The rate is uniform over $\mu_0$ in balls in $S^\beta$.

We think of the parameters $\beta$ and $\alpha$ as the regularity of the true parameter and the prior,
respectively. The first is validated by the fact that in the heat equation case $(e_i)$ is the (sine)
Fourier basis of $L^2[0,1]$. Therefore $\beta$ quantifies the smoothness of $\mu_0$ in Sobolev sense. In case
of the polynomial decay of the variances of the prior (later referred to as the polynomial prior),
the parameter $\alpha$ is also closely related to Sobolev regularity.

The minimax rate of convergence over a Sobolev ball $S^\beta$ is of the order $(\log n)^{-\beta/2}$. Now
consider the case $\lambda_i = \tau_n^2 i^{-1-2\alpha}$. By statement (i) of the theorem the posterior contracts at the
optimal minimax rate if the regularity of the prior is at least the regularity of the truth ($\alpha \ge \beta$)
and the scale $\tau_n$ is fixed. Alternatively, the optimal rate is also attained by appropriately scaling
a prior of any regularity. Note that if $\alpha \ge \beta$ scaling is redundant. The theorem shows that
'correct' specification of the prior regularity gives the optimal rate. In contrast to Knapik et al.
(2011) however, the regularity of the prior does not have to match exactly the regularity of the
truth. Moreover, even though rough priors still need to be scaled to give the optimal rate, there
is no restriction on the 'roughness'.

The second assertion of the theorem shows that for very smooth priors (where we take
$\lambda_i = e^{-\alpha i^2}$) the contraction rate is always optimal. Since the prior does not depend on the
unknown regularity $\beta$, the procedure is rate-adaptive in this case.
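
To get a feeling for how slow these logarithmic rates are, the following Python sketch evaluates the polynomial-prior rate $(\log(n\tau_n^2))^{-\beta/2} + \tau_n(\log(n\tau_n^2))^{-\alpha/2}$ and the minimax rate $(\log n)^{-\beta/2}$ for a few sample sizes. The function names and the chosen values of $n$, $\alpha$ and $\beta$ are illustrative assumptions; the formulas describe orders of magnitude only, so constants are ignored.

```python
import math


def rate_polynomial(n, alpha, beta, tau=1.0):
    """Order of the contraction rate for lambda_i = tau^2 * i^(-1 - 2 alpha)."""
    log_term = math.log(n * tau ** 2)
    return log_term ** (-beta / 2) + tau * log_term ** (-alpha / 2)


def minimax_rate(n, beta):
    """Order of the minimax rate (log n)^(-beta / 2) over a Sobolev ball."""
    return math.log(n) ** (-beta / 2)


if __name__ == "__main__":
    beta = 2.0
    for n in (1e4, 1e8, 1e16):
        print(
            n,
            minimax_rate(n, beta),
            rate_polynomial(n, alpha=1.0, beta=beta),  # undersmoothing prior
            rate_polynomial(n, alpha=3.0, beta=beta),  # oversmoothing prior
        )
```

For $\beta = 2$, squaring the sample size only halves the minimax rate, and an unscaled prior with $\alpha < \beta$ is slower still, since its rate is dominated by $(\log n)^{-\alpha/2}$.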

Both choices of priors lead to the conclusion that oversmoothing yields the optimal rate,
and this has been noted also in the frequentist literature (see Mair, 1994). A fully adaptive
frequentist method is presented in Bissantz and Holzmann (2008), and in both situations the
optimal performance is caused by the dominating bias. However, in Bayesian inference one often
takes the spread in the posterior distribution as a quantification of uncertainty. If $\lambda_i = e^{-\alpha i^2}$
this spread is much smaller than the minimax rate. To understand the implications, we next
consider the frequentist coverage of credible sets. As the posterior is Gaussian, it is natural to
center a credible region at the posterior mean. Different shapes of such a set could be considered,
but the natural counterpart of the preceding theorem is to consider balls. The study of linear
functionals in the next section makes it possible to consider pointwise credible bands as well.

A credible ball centered at the posterior mean $\hat\mu$, where $\hat\mu_i = n\lambda_i\kappa_i(1 + n\lambda_i\kappa_i^2)^{-1} Y_i$, takes
the form

$$\hat\mu + B(r_{n,\gamma}) := \{\mu \in \ell^2 : \|\mu - \hat\mu\| < r_{n,\gamma}\}, \tag{2.4}$$

where $B(r)$ denotes an $\ell^2$-ball of radius $r$ around 0 and the radius $r_{n,\gamma}$ is determined such that

$$\Pi_n\bigl(\hat\mu + B(r_{n,\gamma}) \mid Y\bigr) = 1 - \gamma. \tag{2.5}$$

Because the spread of the posterior is not dependent on the data, neither is the radius $r_{n,\gamma}$. The
frequentist coverage or confidence of the set (2.4) is, by definition,

$$P_{\mu_0}\bigl(\mu_0 \in \hat\mu + B(r_{n,\gamma})\bigr), \tag{2.6}$$

where under the probability measure $P_{\mu_0}$ the variable $Y$ follows (1.2) with $\mu = \mu_0$. We shall
consider the coverage as $n \to \infty$ for fixed $\mu_0$, uniformly in Sobolev balls, and also along sequences
$\mu_0^n$ that change with $n$.

The following theorem shows that the relation of the coverage to the credibility level $1 - \gamma$
is mediated by the regularity of the true $\mu_0$ and the two parameters controlling the regularity
of the prior ($\alpha$ and the scaling $\tau_n$), for both types of priors. For further insight, the credible
region is also compared to the 'correct' frequentist confidence ball $\hat\mu + B(\tilde r_{n,\gamma})$, chosen so that
the probability in (2.6) is exactly equal to $1 - \gamma$.
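
The radius of the credible ball and its frequentist coverage can be approximated by simulation. The sketch below is a Monte Carlo approximation under stated assumptions, not the paper's procedure: the sequence is truncated to finitely many coordinates, the squared radius is taken as an empirical $(1-\gamma)$-quantile of $\sum_i s_i Z_i^2$ with $s_i = \lambda_i/(1 + n\lambda_i\kappa_i^2)$ the posterior variances, and the coverage is estimated by drawing the posterior mean around its expectation. All function names are hypothetical. The finite-$n$ numbers it prints need not resemble the asymptotic statements.

```python
import math
import random


def posterior_scales(lam, T, n):
    """Per coordinate: posterior variance s_i, variance t_i of the posterior
    mean, and the shrinkage factor 1 / (1 + n lam_i kappa_i^2)."""
    s, t, shrink = [], [], []
    for i, l in enumerate(lam, start=1):
        k2 = math.exp(-2 * i ** 2 * math.pi ** 2 * T)  # kappa_i^2
        f = 1.0 / (1.0 + n * l * k2)
        s.append(l * f)
        t.append(n * l * l * k2 * f * f)
        shrink.append(f)
    return s, t, shrink


def credible_radius_sq(s, gamma, rng, draws):
    """Empirical (1 - gamma)-quantile of sum_i s_i Z_i^2 (squared radius)."""
    samples = sorted(
        sum(si * rng.gauss(0.0, 1.0) ** 2 for si in s) for _ in range(draws)
    )
    return samples[min(int((1 - gamma) * draws), draws - 1)]


def coverage(mu0, lam, T, n, gamma=0.05, draws=2000, seed=0):
    """Monte Carlo estimate of P(||mu_hat - mu0|| <= r_{n,gamma})."""
    rng = random.Random(seed)
    s, t, shrink = posterior_scales(lam, T, n)
    r_sq = credible_radius_sq(s, gamma, rng, draws)
    # E[mu_hat_i] - mu_{0,i} = -mu_{0,i} / (1 + n lam_i kappa_i^2)
    bias = [-m * f for m, f in zip(mu0, shrink)]
    sd = [math.sqrt(ti) for ti in t]
    hits = 0
    for _ in range(draws):
        dist_sq = sum((b + d * rng.gauss(0.0, 1.0)) ** 2 for b, d in zip(bias, sd))
        hits += dist_sq <= r_sq
    return hits / draws


if __name__ == "__main__":
    N, T, n = 50, 0.1, 10 ** 4
    # Hypothetical truth with coefficients decaying like i^(-3).
    mu0 = [1.0 / i ** 3 for i in range(1, N + 1)]
    priors = {
        "polynomial, alpha = 1": [i ** -3.0 for i in range(1, N + 1)],
        "exponential, alpha = 1": [math.exp(-(i ** 2)) for i in range(1, N + 1)],
    }
    for name, lam in priors.items():
        print(name, coverage(mu0, lam, T, n))
```

Note that the radius depends only on the prior and $n$, not on the data, so it is computed once per setting.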



Theorem 2.2 Suppose the true parameter $\mu_0$ belongs to $S^\beta$ for $\beta > 0$.

If $\lambda_i = \tau_n^2 i^{-1-2\alpha}$ for some $\alpha > 0$ and $\tau_n > 0$ such that $n\tau_n^2 \to \infty$, then the asymptotic coverage
of the credible region (2.4) is

(i) 1, uniformly in $\mu_0$ with $\|\mu_0\|_\beta \le 1$, if $\tau_n \gg (\log n)^{(\alpha-\beta)/2}$; in this case $r_{n,\gamma}/\tilde r_{n,\gamma} \to \infty$.

(ii) 1, uniformly in $\mu_0$ with $\|\mu_0\|_\beta \le r$ for $r$ small enough, if $\tau_n \asymp (\log n)^{(\alpha-\beta)/2}$;
1, for every fixed $\mu_0 \in S^\beta$, if $\tau_n \asymp (\log n)^{(\alpha-\beta)/2}$.

(iii) 0, along some $\mu_0^n$ with $\sup_n \|\mu_0^n\|_\beta < \infty$, if $\tau_n \lesssim (\log n)^{(\alpha-\beta)/2}$.

If $\lambda_i = e^{-\alpha i^2}$ for some $\alpha > 0$, then the asymptotic coverage of the credible region (2.4) is

(iv) 0, for every $\mu_0$ such that $|\mu_{0,i}| \gtrsim e^{-c i^2/2}$ for some $c < \alpha$.

If $\tau_n = 1$, then the cases (i), (ii), and (iii) arise if $\alpha < \beta$, $\alpha = \beta$ and $\alpha \ge \beta$, respectively. If
$\alpha > \beta$ in case (iii) the sequence $\mu_0^n$ can then be chosen fixed.

The easiest interpretation of the theorem is in the situation without scaling ($\tau_n = 1$). Then
oversmoothing the prior (case (iii): polynomial prior with $\alpha > \beta$, and case (iv): exponential
prior) has disastrous consequences for the coverage of the credible sets, whereas undersmoothing
(case (i): polynomial prior with $\alpha < \beta$) leads to (very) conservative sets. Choosing a prior of
correct regularity (case (ii) and (iii): polynomial prior with $\alpha = \beta$) gives mixed results, depending
on the norm of the true $\mu_0$. These conclusions are analogous to the ones that can be drawn from
Theorem 4.2 in Knapik et al. (2011) for the mildly ill-posed case.

There is one crucial difference, namely that the radius of the conservative sets in case (i) is
not of the correct order of magnitude. This means that the radius $\tilde r_{n,\gamma}$ of the 'correct' frequentist
confidence ball is of strictly smaller order than the radius of the Bayesian credible ball.

By Theorem 2.1 the optimal contraction rate is obtained by smooth priors. Combining
the two theorems leads to the conclusion that polynomial priors that slightly undersmooth the
truth might be preferable. They attain a nearly optimal rate of contraction and the spread of
their posterior gives a reasonable sense of uncertainty. Slightly undersmoothing is only possible
however if an assumption about the regularity of the unknown true function is made. It is an
important open problem to devise methods that achieve this automatically, without knowledge
about the true regularity. Exponential priors, although adaptive and rate-optimal, often lead to
very bad pointwise credible bands.

3 Recovering linear functionals of the parameter 

In this section we consider the posterior distribution of a linear functional $L\mu$ of the parameter.
In the Bayesian setting we consider measurable linear functionals relative to the prior, covering
the class of continuous functionals, but also certain discontinuous functionals (for instance point
evaluation), following the definition of Skorohod (1974). Let $(l_i) \in \mathbb{R}^\infty$ satisfy $\sum_{i=1}^\infty l_i^2 \lambda_i < \infty$.
Then it can be shown that $L\mu := \lim_{n\to\infty} \sum_{i=1}^n l_i \mu_i$ exists for all $\mu = (\mu_i)$ in a (measurable)
subspace of $\ell^2$ with $\bigotimes_{i=1}^\infty N(0, \lambda_i)$-probability one. We define $L\mu = 0$ if the limit does not exist.

The posterior of the linear functional $L\mu$ can be obtained from (2.1) and the definition given
above (see also Knapik et al., 2011):

$$\Pi_n(\mu : L\mu \in \cdot \mid Y) = N\Bigl(\sum_{i=1}^\infty \frac{n\lambda_i\kappa_i l_i}{1 + n\lambda_i\kappa_i^2}\, Y_i,\ \sum_{i=1}^\infty \frac{l_i^2 \lambda_i}{1 + n\lambda_i\kappa_i^2}\Bigr). \tag{3.1}$$

We measure the smoothness of the functional $L$ by the size of the coefficients $l_i$ as $i \to \infty$. It
is natural to assume that the sequence $(l_i)$ is in the Sobolev space $S^q$ for some $q$, but also more
controlled behavior will be assumed in following theorems. We say that the marginal posterior
of $L\mu$ contracts around $L\mu_0$ at the rate $\varepsilon_n$ if

$$E_{\mu_0} \Pi_n\bigl(\mu : |L\mu - L\mu_0| \ge M_n \varepsilon_n \mid Y\bigr) \to 0$$

as $n \to \infty$, for every sequence $M_n \to \infty$.

Theorem 3.1 Suppose the true parameter $\mu_0$ belongs to $S^\beta$ for $\beta > 0$.

If $\lambda_i = \tau_n^2 i^{-1-2\alpha}$ for some $\alpha > 0$ and $\tau_n > 0$ such that $n\tau_n^2 \to \infty$, and the representer $(l_i)$ of
the linear functional $L$ is contained in $S^q$, or $l_i \lesssim i^{-q-1/2}$, for some $q \ge -\beta$, then the marginal
posterior of $L\mu$ contracts around $L\mu_0$ at the rate

$$\varepsilon_n = (\log(n\tau_n^2))^{-(\beta+q)/2} + \tau_n (\log(n\tau_n^2))^{-(1/2+\alpha+q)/2}. \tag{3.2}$$

The rate is uniform over $\mu_0$ in balls in $S^\beta$. In particular:

(i) If $\tau_n = 1$, then $\varepsilon_n = (\log n)^{-(\beta \wedge (1/2+\alpha) + q)/2}$.

(ii) If $n^{-1/2+\delta} \lesssim \tau_n \lesssim (\log n)^{(1/2+\alpha-\beta)/2}$, for some $\delta > 0$, then $\varepsilon_n = (\log n)^{-(\beta+q)/2}$.

If $\lambda_i = e^{-\alpha i^2}$ for some $\alpha > 0$, then the marginal posterior of $L\mu$ contracts around $L\mu_0$ at the
rate

$$\varepsilon_n = (\log n)^{-(\beta+q)/2}. \tag{3.3}$$

The rate is uniform over $\mu_0$ in balls in $S^\beta$.

The minimax rate over a ball in the Sobolev space $S^\beta$ is known to be bounded above by
$(\log n)^{-(\beta+q)/2}$ (for the case of $q = -1/2$ see Goldenshluger, 1999, and for general $q$ in a closely
related model see Butucea and ...). In view of Theorem 2.1 it is not surprising that
exponential priors yield this optimal rate. In case of the polynomial prior this rate is attained
without scaling if and only if the prior smoothness $\alpha$ is greater than or equal to $\beta$ minus 1/2.
Here we observe a similar phenomenon as in Knapik et al. (2011), where the 'loss' in smoothness
by 1/2 is discussed. The regularity of the parameter in the Sobolev scale is not the appropriate
type of regularity to consider for estimating a linear functional $L\mu$. If the polynomial prior is
too rough, then the minimax rate may still be attained by scaling the prior. The upper bound
on the scaling is the same as in the global case (see Theorem 2.1 (ii)) after decreasing $\beta$ by 1/2.
So the 'loss in regularity' persists in the scaling.

Because the posterior distribution for the linear functional $L\mu$ is the one-dimensional normal
distribution $N(\widehat{L\mu}, s_n^2)$, where $s_n^2$ is the posterior variance in (3.1), the natural credible interval
for $L\mu$ has endpoints $\widehat{L\mu} \pm z_{\gamma/2} s_n$, for $z_\gamma$ the (lower) standard normal $\gamma$-quantile. The coverage
of this interval is

$$P_{\mu_0}\bigl(\widehat{L\mu} + z_{\gamma/2} s_n \le L\mu_0 \le \widehat{L\mu} - z_{\gamma/2} s_n\bigr),$$

where $Y$ follows (1.2) with $\mu = \mu_0$. In the following theorem we restrict $(l_i)$ to sequences that
behave polynomially.

Theorem 3.2 Suppose the true parameter $\mu_0$ belongs to $S^\beta$ for $\beta > 0$.

If $\lambda_i = \tau_n^2 i^{-1-2\alpha}$ for some $\alpha > 0$ and $\tau_n > 0$ such that $n\tau_n^2 \to \infty$, and $l_i \asymp i^{-q-1/2}$, then
the asymptotic coverage of the interval $\widehat{L\mu} \pm z_{\gamma/2} s_n$ is:

(i) 1, uniformly in $\mu_0$ such that $\|\mu_0\|_\beta \le 1$, if $\tau_n \gg (\log n)^{(1/2+\alpha-\beta)/2}$;

(ii) 1, uniformly in $\mu_0$ with $\|\mu_0\|_\beta \le r$ for $r$ small enough, if $\tau_n \asymp (\log n)^{(1/2+\alpha-\beta)/2}$;
1, for every fixed $\mu_0 \in S^\beta$, if $\tau_n \asymp (\log n)^{(1/2+\alpha-\beta)/2}$;

(iii) 0, along some $\mu_0^n$ with $\sup_n \|\mu_0^n\|_\beta < \infty$, if $\tau_n \lesssim (\log n)^{(1/2+\alpha-\beta)/2}$.

If $\lambda_i = e^{-\alpha i^2}$ for some $\alpha > 0$, then the asymptotic coverage of the interval $\widehat{L\mu} \pm z_{\gamma/2} s_n$ is:

(iv) 0, for every $\mu_0$ such that $\mu_{0,i} l_i \gtrsim e^{-c i^2/2} i^{-q-1/2}$ for some $c < \alpha$.

In case (iii) the sequence $\mu_0^n$ can be taken a fixed element $\mu_0$ in $S^\beta$ if $\tau_n \lesssim \tilde\tau_n (\log n)^{-\delta}$ for some
$\delta > 0$. Furthermore, if $\tau_n = 1$, then the cases (i), (ii) and (iii) arise if $\alpha < \beta - 1/2$, $\alpha = \beta - 1/2$
and $\alpha \ge \beta - 1/2$, respectively. If $\alpha > \beta - 1/2$ in case (iii) the sequence $\mu_0^n$ can then be chosen
fixed.
Similarly as in the problem of full recovery of the parameter $\mu$, oversmoothing leads to
coverage 0, while undersmoothing gives (extremely) conservative intervals. In the case of a
polynomial prior without scaling the cut-off for under- or oversmoothing is at $\alpha = \beta - 1/2$,
while the cut-off for scaling is at the optimal rate $\tilde\tau_n$. Exponential priors are bad even for very
smooth $\mu_0$ and the asymptotic coverage in this case is always 0. It should be noted that too
much undersmoothing is also undesirable, as it leads to very wide credible intervals, and may
cause $\sum_{i=1}^\infty l_i^2 \lambda_i$ to be no longer finite.

In contrast with the analogous theorem in Knapik et al. (2011), the conservativeness in case
of undersmoothing is extreme, as the coverage is 1. Since it holds for every linear functional that
can be considered in this setting, we do not have a Bernstein-von Mises theorem. The linear
functionals considered in this section are not smooth enough to cancel the ill-posedness of the
problem (cf. discussion after Theorem 5.4 in Knapik et al., 2011).



4 Simulation example 

To illustrate our results with simulated data we fix a time $T = 0.1$ and a true function $\mu_0$,
which we expand as $\mu_0 = \sum_{i=1}^\infty \mu_{0,i} e_i$ in the basis $(e_i)$. The simulated data are the noisy and
transformed coefficients

$$Y_i = \kappa_i \mu_{0,i} + \frac{1}{\sqrt{n}} Z_i.$$

The (marginal) posterior distribution for the function $\mu$ at a point $x$ is obtained by expanding
$\mu(x) = \sum_{i=1}^\infty \mu_i e_i(x)$, and applying the framework of linear functionals $L\mu = \sum_{i=1}^\infty l_i \mu_i$ with
$l_i = e_i(x)$ (so $l_i \lesssim 1$ and $q = -1/2$). Recall

$$\Pi_n(\mu : L\mu \in \cdot \mid Y) = N\Bigl(\sum_{i=1}^\infty \frac{n\lambda_i\kappa_i l_i}{1 + n\lambda_i\kappa_i^2}\, Y_i,\ \sum_{i=1}^\infty \frac{l_i^2 \lambda_i}{1 + n\lambda_i\kappa_i^2}\Bigr).$$

We obtained (marginal) posterior pointwise credible bands by computing for every $x$ a central
95% interval for the normal distribution on the right side of the above display. We considered
both types of priors.
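
The computation of a pointwise band can be sketched as follows, assuming the point-evaluation functional $l_i = e_i(x) = \sqrt{2}\sin(i\pi x)$, so that the marginal posterior of $\mu(x)$ is normal with mean $\sum_i n\lambda_i\kappa_i e_i(x) Y_i/(1 + n\lambda_i\kappa_i^2)$ and variance $\sum_i e_i(x)^2\lambda_i/(1 + n\lambda_i\kappa_i^2)$. The function name `pointwise_band`, the truncation level and the test data are illustrative assumptions, not the code used for the figures.

```python
import math
import random
from statistics import NormalDist


def pointwise_band(x, y, lam, T, n, level=0.95):
    """Central credible interval for mu(x): returns (lower, mean, upper).

    y and lam hold the first len(y) observations Y_i and prior variances
    lambda_i; the infinite sums are truncated accordingly (an assumption).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    mean, var = 0.0, 0.0
    for i, (yi, li) in enumerate(zip(y, lam), start=1):
        ki = math.exp(-(i ** 2) * math.pi ** 2 * T)
        ei = math.sqrt(2) * math.sin(i * math.pi * x)
        denom = 1.0 + n * li * ki ** 2
        mean += n * li * ki * ei * yi / denom
        var += ei ** 2 * li / denom
    half = z * math.sqrt(var)
    return mean - half, mean, mean + half


if __name__ == "__main__":
    N, T, n = 100, 0.1, 10 ** 4
    rng = random.Random(0)
    # Hypothetical truth with coefficients i^(-3); data drawn from the model.
    y = [
        math.exp(-(i ** 2) * math.pi ** 2 * T) / i ** 3 + rng.gauss(0, 1) / math.sqrt(n)
        for i in range(1, N + 1)
    ]
    lam = [i ** -3.0 for i in range(1, N + 1)]  # polynomial prior, alpha = 1
    for x in (0.25, 0.5, 0.75):
        print(x, pointwise_band(x, y, lam, T, n))
```

The width of the band does not depend on the data, only on $x$, $n$ and the prior, which is why smoother priors produce uniformly narrower bands.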

Figure 1 illustrates these bands for $n = 10^4$ and the polynomial prior. In each of the 10 panels
in the figure the black curve represents the function $\mu_0$, defined by

$$\mu_0(x) = 4x(x-1)(8x-5), \qquad \mu_{0,i} = \frac{8\sqrt{2}\,(13 + 11(-1)^i)}{\pi^3 i^3}, \tag{4.1}$$

where $\mu_{0,i}$ are the coefficients relative to $e_i$, thus $\mu_0 \in S^\beta$ for every $\beta < 2.5$. The 10 panels
represent 10 independent realizations of the data, yielding 10 different realizations of the
posterior mean (the red curves) and the posterior pointwise credible bands (the green curves). In
the left five panels the prior is given by $\lambda_i = i^{-1-2\alpha}$ with $\alpha = 1$, whereas in the right panels the
prior corresponds to $\alpha = 3$. Each of the 10 panels also shows 20 realizations from the posterior
distribution. This is also valid for Figure 2, with the exponential prior, so $\lambda_i = e^{-\alpha i^2}$. In the
left panels $\alpha = 1$, and in the right panels $\alpha = 5$.
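
The closed form for the coefficients of the true function $4x(x-1)(8x-5)$ in the basis $\sqrt{2}\sin(i\pi x)$, namely $8\sqrt{2}(13 + 11(-1)^i)/(\pi^3 i^3)$, can be checked numerically. The following Python sketch compares it with a composite Simpson approximation of the inner product and with a partial Fourier sum; the function names and the number of quadrature nodes are arbitrary choices made for illustration.

```python
import math


def mu0(x):
    """True initial condition used in the simulation example."""
    return 4 * x * (x - 1) * (8 * x - 5)


def coefficient_closed_form(i):
    """Closed-form coefficient of mu0 relative to e_i(x) = sqrt(2) sin(i pi x)."""
    return 8 * math.sqrt(2) * (13 + 11 * (-1) ** i) / (math.pi ** 3 * i ** 3)


def coefficient_numeric(i, m=2000):
    """Composite Simpson approximation of int_0^1 mu0(x) e_i(x) dx (m even)."""
    h = 1.0 / m

    def f(x):
        return mu0(x) * math.sqrt(2) * math.sin(i * math.pi * x)

    total = f(0.0) + f(1.0)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3


def partial_sum(x, terms):
    """Truncated expansion sum_{i <= terms} mu_{0,i} e_i(x)."""
    return sum(
        coefficient_closed_form(i) * math.sqrt(2) * math.sin(i * math.pi * x)
        for i in range(1, terms + 1)
    )


if __name__ == "__main__":
    for i in range(1, 7):
        print(i, coefficient_closed_form(i), coefficient_numeric(i))
    print(mu0(0.3), partial_sum(0.3, 200))
```

Since the coefficients decay like $i^{-3}$, the series $\sum_i \mu_{0,i}^2 i^{2\beta}$ converges exactly when $\beta < 2.5$, in line with the statement above.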

A comparison of the left and right panels in Figure 1 shows that the rough polynomial prior
($\alpha = 1$) is aware of the difficulty of the inverse problem: it produces wide pointwise credible bands
that in (almost) all cases contain nearly the whole true curve. Figure 1 together with Figure 2
show that smooth priors (polynomial with $\alpha = 3$ and both exponential priors) are overconfident:
the spread of the posterior distribution poorly reflects the imprecision of estimation. Our
theoretical results show that the inaccurate quantification of the estimation error (by the posterior
spread) remains even as $n \to \infty$.

The reconstruction, by the posterior mean or any other posterior quantiles, will eventually 
converge to the true curve. The specification of the prior influences the speed of this convergence. 
Figure 1: Polynomial prior. Realizations of the posterior mean (red) and (marginal) posterior
credibility bands (green), and 20 draws from the posterior (dashed curves). In all ten panels
$n = 10^4$. Left 5 panels: $\alpha = 1$; right 5 panels: $\alpha = 3$. True curve (black) given by (4.1).

Figure 2: Exponential prior. Realizations of the posterior mean (red) and (marginal) posterior
credibility bands (green), and 20 draws from the posterior (dashed curves). In all ten panels
$n = 10^4$. Left 5 panels: $\alpha = 1$; right 5 panels: $\alpha = 5$. True curve (black) given by (4.1).

Figure 3: Polynomial prior. Realizations of the posterior mean (red) and (marginal) posterior
credibility bands (green), and 20 draws from the posterior (dashed curves). Left 5 panels:
$n = 10^4$ and $\alpha = 0.5, 1, 2, 5, 10$ (top to bottom); right 5 panels: $n = 10^8$ and $\alpha = 0.5, 1, 2, 5, 10$
(top to bottom). True curve (black) given by (4.1).

Figure 4: Exponential prior. Realizations of the posterior mean (red) and (marginal) posterior
credibility bands (green), and 20 draws from the posterior (dashed curves). Left 5 panels:
$n = 10^4$ and $\alpha = 0.5, 1, 2, 5, 10$ (top to bottom); right 5 panels: $n = 10^8$ and $\alpha = 0.5, 1, 2, 5, 10$
(top to bottom). True curve (black) given by (4.1).
This is illustrated in Figures 3 and 4. Each of the 10 panels in each of the figures is constructed
as before, but now with $n = 10^4$ and $n = 10^8$ for the five panels on the left and
right side, respectively, and with $\alpha = 1/2, 1, 2, 5, 10$ for the five panels from top to bottom
($\lambda_i = i^{-1-2\alpha}$ in Figure 3, and $\lambda_i = e^{-\alpha i^2}$ in Figure 4). As discussed above, all exponential priors
give the optimal rate, but lead to bad pointwise credible bands. Also smooth polynomial priors
give the optimal rate. This can be seen in Figure 3 for $n = 10^8$ and $\alpha = 2$ or 5, where pointwise
credible bands are very close to the true curve. However, for $\alpha = 5$ it should be noted that the
true curve is mostly outside the pointwise credible band.



5 Proofs 

5.1 Proof of Theorem 2.1

Let $s_{i,n}$ and $t_{i,n}$ be such that the posterior distribution in (2.1) can be denoted by
$\bigotimes_{i=1}^\infty N(\sqrt{n t_{i,n}}\, Y_i, s_{i,n})$. Because the posterior is Gaussian, it follows that

$$\int \|\mu - \mu_0\|^2 \, d\Pi_n(\mu \mid Y) = \|\hat\mu - \mu_0\|^2 + \sum_{i=1}^\infty s_{i,n}, \tag{5.1}$$

where $Y$ follows (1.2) with $\mu = \mu_0$, and

$$s_{i,n} = \frac{\lambda_i}{1 + n\lambda_i\kappa_i^2}, \qquad t_{i,n} = \frac{n\lambda_i^2\kappa_i^2}{(1 + n\lambda_i\kappa_i^2)^2}.$$

By Markov's inequality the left side of (5.1) is an upper bound to $M_n^2\varepsilon_n^2\, \Pi_n(\mu : \|\mu - \mu_0\| \ge
M_n\varepsilon_n \mid Y)$. Therefore, it suffices to show that the expectation under $\mu_0$ of the right side of the
display is bounded by a multiple of $\varepsilon_n^2$. The expectation of the first term is the mean square
error of the posterior mean $\hat\mu$, and can be written as the sum $\|E_{\mu_0}\hat\mu - \mu_0\|^2 + \sum_{i=1}^\infty t_{i,n}$ of its
square bias and 'variance'. The second term $\sum_{i=1}^\infty s_{i,n}$ is deterministic. If $\lambda_i = \tau_n^2 i^{-1-2\alpha}$ the
three quantities are given by:

$$\|E_{\mu_0}\hat\mu - \mu_0\|^2 = \sum_{i=1}^\infty \frac{\mu_{0,i}^2}{(1 + n\lambda_i\kappa_i^2)^2} = \sum_{i=1}^\infty \frac{\mu_{0,i}^2}{(1 + n\tau_n^2 i^{-1-2\alpha} e^{-2\pi^2 T i^2})^2}, \tag{5.2}$$

$$\sum_{i=1}^\infty t_{i,n} = \sum_{i=1}^\infty \frac{n\lambda_i^2\kappa_i^2}{(1 + n\lambda_i\kappa_i^2)^2} = \sum_{i=1}^\infty \frac{n\tau_n^4 i^{-2-4\alpha} e^{-2\pi^2 T i^2}}{(1 + n\tau_n^2 i^{-1-2\alpha} e^{-2\pi^2 T i^2})^2}, \tag{5.3}$$

$$\sum_{i=1}^\infty s_{i,n} = \sum_{i=1}^\infty \frac{\lambda_i}{1 + n\lambda_i\kappa_i^2} = \sum_{i=1}^\infty \frac{\tau_n^2 i^{-1-2\alpha}}{1 + n\tau_n^2 i^{-1-2\alpha} e^{-2\pi^2 T i^2}}. \tag{5.4}$$

By Lemma 6.1 (applied with $q = \beta$, $t = 0$, $r = 0$, $u = 1 + 2\alpha$, $p = 2\pi^2 T$, $v = 2$, and $N = n\tau_n^2$)
the first term can be bounded by $(\log(n\tau_n^2))^{-\beta}$, which accounts for the first term in the definition
of $\varepsilon_n$ in (2.2). By Lemma 6.2 (applied with $t = 2 + 4\alpha$, $r = 2\pi^2 T$, $u = 1 + 2\alpha$, $p = 2\pi^2 T$, $v = 2$,
and $N = n\tau_n^2$) the second expression is of the order $\tau_n^2 (\log n\tau_n^2)^{-1/2-\alpha}$. The third expression is
of the order of the square of the second term in the definition of $\varepsilon_n$ in (2.2), by Lemma 6.2 (applied
with $t = 1 + 2\alpha$, $r = 0$, $u = 1 + 2\alpha$, $p = 2\pi^2 T$, $v = 1$, and $N = n\tau_n^2$).

The consequences (i)-(ii) follow by verification after substitution of $\tau_n$ as given.

In case of $\lambda_i = e^{-\alpha i^2}$, we replace $i^{-1-2\alpha}$ by $e^{-\alpha i^2}$ and set $\tau_n = 1$ in (5.2)-(5.4). We then
apply Lemma 6.1 (with $q = \beta$, $t = 0$, $r = 0$, $u = 0$, $p = 2\pi^2 T + \alpha$, $v = 2$, and $N = n$) and
see that the first term can be bounded by $(\log n)^{-\beta}$, which accounts for the first term in the
definition of $\varepsilon_n$ in (2.3). By Lemma 6.2 (applied with $t = 0$, $r = 2\alpha + 2\pi^2 T$, $u = 0$, $p = 2\pi^2 T + \alpha$,
$v = 2$, and $N = n$), and again Lemma 6.2 (applied with $t = 0$, $r = \alpha$, $u = 0$, $p = \alpha + 2\pi^2 T$,
$v = 1$, and $N = n$) the latter two are of the order $n^{-\alpha/(\alpha + 2\pi^2 T)}$.
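
The three quantities that drive the proof, namely the squared bias $\sum_i \mu_{0,i}^2/(1 + n\lambda_i\kappa_i^2)^2$, the 'variance' $\sum_i n\lambda_i^2\kappa_i^2/(1 + n\lambda_i\kappa_i^2)^2$ and the posterior spread $\sum_i \lambda_i/(1 + n\lambda_i\kappa_i^2)$, are easy to evaluate numerically for a truncated sequence. The Python sketch below does this for the polynomial prior with $\tau_n = 1$; the function name `risk_terms`, the truncation level, and the choice of truth (coefficients $8\sqrt{2}(13 + 11(-1)^i)/(\pi^3 i^3)$) are assumptions for illustration.

```python
import math


def risk_terms(mu0, lam, T, n):
    """Return (squared bias, sum of t_i, sum of s_i) for a truncated sequence."""
    bias_sq = var_sum = spread = 0.0
    for i, (m, l) in enumerate(zip(mu0, lam), start=1):
        k2 = math.exp(-2 * i ** 2 * math.pi ** 2 * T)  # kappa_i^2
        denom = 1.0 + n * l * k2
        bias_sq += (m / denom) ** 2
        var_sum += n * l * l * k2 / denom ** 2
        spread += l / denom
    return bias_sq, var_sum, spread


if __name__ == "__main__":
    N, T, alpha = 200, 0.1, 1.0
    mu0 = [
        8 * math.sqrt(2) * (13 + 11 * (-1) ** i) / (math.pi ** 3 * i ** 3)
        for i in range(1, N + 1)
    ]
    lam = [i ** (-1.0 - 2.0 * alpha) for i in range(1, N + 1)]
    for n in (10 ** 4, 10 ** 8, 10 ** 16):
        print(n, risk_terms(mu0, lam, T, n))
```

In such experiments the squared bias typically dominates the other two terms, in line with the remark that the optimal performance of smooth priors is caused by the dominating bias.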
5.2 Proof of Theorem 2.2

Because the posterior distribution is $\bigotimes_{i=1}^\infty N(\sqrt{n t_{i,n}}\, Y_i, s_{i,n})$, by (2.1), the radius $r_{n,\gamma}$ in (2.5)
satisfies $P(U_n < r_{n,\gamma}^2) = 1 - \gamma$, for $U_n$ a random variable distributed as the square norm of an
$\bigotimes_{i=1}^\infty N(0, s_{i,n})$-variable. Under (1.2) the variable $\hat\mu$ is $\bigotimes_{i=1}^\infty N((E_{\mu_0}\hat\mu)_i, t_{i,n})$-distributed, and
thus the coverage (2.6) can be written as

$$P\bigl(\|W_n + E_{\mu_0}\hat\mu - \mu_0\| \le r_{n,\gamma}\bigr), \tag{5.5}$$

for $W_n$ possessing a $\bigotimes_{i=1}^\infty N(0, t_{i,n})$-distribution. For ease of notation let $V_n = \|W_n\|^2$.

The variables $U_n$ and $V_n$ can be represented as $U_n = \sum_{i=1}^\infty s_{i,n} Z_i^2$ and $V_n = \sum_{i=1}^\infty t_{i,n} Z_i^2$,
for $Z_1, Z_2, \ldots$ independent standard normal variables, and $s_{i,n}$ and $t_{i,n}$ as in the proof of
Theorem 2.1. By Lemma 6.2 (cf. previous subsection)

$$EU_n = \sum_{i=1}^\infty s_{i,n} \asymp \tau_n^2 (\log n\tau_n^2)^{-\alpha}, \qquad \operatorname{sd} U_n = \sqrt{2\sum_{i=1}^\infty s_{i,n}^2} \asymp \tau_n^2 (\log n\tau_n^2)^{-1/4-\alpha},$$

$$EV_n = \sum_{i=1}^\infty t_{i,n} \asymp \tau_n^2 (\log n\tau_n^2)^{-1/2-\alpha}, \qquad \operatorname{sd} V_n = \sqrt{2\sum_{i=1}^\infty t_{i,n}^2} \asymp \tau_n^2 (\log n\tau_n^2)^{-1/2-\alpha}.$$

It follows that

$$r_{n,\gamma}^2 \asymp \tau_n^2 (\log n\tau_n^2)^{-\alpha} \asymp EU_n \gg EV_n \asymp \operatorname{sd} V_n,$$

and therefore

$$P(V_n \le \delta r_{n,\gamma}^2) = P\Bigl(\frac{V_n - EV_n}{\operatorname{sd} V_n} \le \frac{\delta r_{n,\gamma}^2 - EV_n}{\operatorname{sd} V_n}\Bigr) \to 1, \tag{5.6}$$

for every $\delta > 0$. The square norm of the bias $E_{\mu_0}\hat\mu - \mu_0$ is given in (5.2), where it was noted
that

$$B_n := \sup_{\|\mu_0\|_\beta \lesssim 1} \|E_{\mu_0}\hat\mu - \mu_0\| \asymp (\log n\tau_n^2)^{-\beta/2}.$$

The bias $B_n$ is decreasing in $\tau_n$, whereas $EU_n$ is increasing. The scaling rate $\tilde\tau_n \asymp (\log n)^{(\alpha-\beta)/2}$
balances the square bias $B_n^2$ with the posterior spread $EU_n$, and hence with $r_{n,\gamma}^2$.

Case (i). In this case $B_n \ll r_{n,\gamma}$. Hence $P(\|W_n + E_{\mu_0}\hat\mu - \mu_0\| \le r_{n,\gamma}) \ge P(\|W_n\| \le
r_{n,\gamma} - B_n) = P(V_n \le r_{n,\gamma}^2(1 + o(1))) \to 1$, uniformly in the set of $\mu_0$ in the supremum defining
$B_n$. Note that $\tilde r_{n,\gamma}$ is such that the coverage in (5.5) is exactly $1 - \gamma$. Since $\|W_n\|^2 = V_n$, we
have that $\tilde r_{n,\gamma}^2$ is of the order $B_n^2 + \tau_n^2 (\log n\tau_n^2)^{-1/2-\alpha}$, so of strictly smaller order than $r_{n,\gamma}^2$, and
therefore $r_{n,\gamma}/\tilde r_{n,\gamma} \to \infty$.

Case (ii). In this case $B_n \asymp r_{n,\gamma}$. By the second assertion of Lemma 6.2 the bias $\|E_{\mu_0}\hat\mu - \mu_0\|$
at a fixed $\mu_0$ is of strictly smaller order than the supremum $B_n$. The argument of (i) shows that
the asymptotic coverage then tends to 1. The maximal bias $B_n(r)$ over $\|\mu_0\|_\beta \le r$ is of the order
$r_{n,\gamma}$ and proportional to the radius $r$. Thus for small enough $r$ we have that $r_{n,\gamma} - B_n(r) \gtrsim
r_{n,\gamma}$. Then $P(\|W_n + E_{\mu_0}\hat\mu - \mu_0\| \le r_{n,\gamma}) \ge P(\|W_n\| \le r_{n,\gamma} - B_n(r)) \ge P(V_n \lesssim r_{n,\gamma}^2) \to 1$.

Case (iii). In this case $B_n \gtrsim r_{n,\gamma}$. Hence any sequence $\mu_0^n$ that (nearly) attains the maximal
bias over a sufficiently large ball $\|\mu_0\|_\beta \le r$ such that $B_n(r) - r_{n,\gamma} \gtrsim r_{n,\gamma}$ satisfies $P(\|W_n +
E_{\mu_0}\hat\mu - \mu_0\| \le r_{n,\gamma}) \le P(\|W_n\| \ge B_n(r) - r_{n,\gamma}) \le P(V_n \gtrsim r_{n,\gamma}^2) \to 0$.

If $\tau_n = 1$, then $B_n$ and $r_{n,\gamma}$ are both powers of $1/\log n$ and hence $B_n \gg r_{n,\gamma}$ implies that
$B_n \gtrsim r_{n,\gamma} (\log n)^\delta$, for some $\delta > 0$. The preceding argument then applies for a fixed $\mu_0$ of the
form $\mu_{0,i} \asymp i^{-1/2-\beta-\epsilon}$, for small $\epsilon > 0$, that gives a bias that is much closer than $(\log n)^\delta$ to $B_n$.

Case (iv). In the proof of Theorem 2.1 we obtained $EU_n \asymp EV_n \asymp n^{-\alpha/(\alpha + 2\pi^2 T)}$. It can be
shown that $\operatorname{sd} U_n \asymp n^{-\alpha/(\alpha + 2\pi^2 T)}$, so also $r_{n,\gamma}^2 \asymp n^{-\alpha/(\alpha + 2\pi^2 T)}$. If $|\mu_{0,i}| \gtrsim e^{-c i^2/2}$ for some $c < \alpha$,
we have

$$\|E_{\mu_0}\hat\mu - \mu_0\|^2 = \sum_{i=1}^\infty \frac{\mu_{0,i}^2}{(1 + n\lambda_i\kappa_i^2)^2} \gtrsim \sum_{i=1}^\infty \frac{e^{-c i^2}}{(1 + n e^{-(\alpha + 2\pi^2 T) i^2})^2} \asymp n^{-c/(\alpha + 2\pi^2 T)} \gg n^{-\alpha/(\alpha + 2\pi^2 T)}$$

by Lemma 6.2 (applied with $t = 0$, $r = c$, $u = 0$, $p = \alpha + 2\pi^2 T$, $v = 2$, and $N = n$). Hence
$P(\|W_n + E_{\mu_0}\hat\mu - \mu_0\| \le r_{n,\gamma}) \le P(V_n \gtrsim \|E_{\mu_0}\hat\mu - \mu_0\|^2 - r_{n,\gamma}^2) \to 0$.

5.3 Proof of Theorem I5H 

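By (3.1) the posterior distribution is $N(\widehat{L\mu}, s_n^2)$, and hence similarly as in the proof of Theorem 2.1
it suffices to show that

$$|E_{\mu_0}\widehat{L\mu} - L\mu_0|^2 + t_n^2 + s_n^2$$

is bounded above by a multiple of $\varepsilon_n^2$. If $\lambda_i = \tau_n^2 i^{-1-2\alpha}$ the three quantities are given by

$$E_{\mu_0}\widehat{L\mu} - L\mu_0 = -\sum_{i=1}^\infty \frac{l_i\mu_{0,i}}{1 + n\lambda_i\kappa_i^2} = -\sum_{i=1}^\infty \frac{l_i\mu_{0,i}}{1 + n\tau_n^2 i^{-1-2\alpha} e^{-2\pi^2 T i^2}}, \tag{5.7}$$

$$t_n^2 = \sum_{i=1}^\infty \frac{l_i^2\, n\lambda_i^2\kappa_i^2}{(1 + n\lambda_i\kappa_i^2)^2} = n\tau_n^4 \sum_{i=1}^\infty \frac{l_i^2\, i^{-2-4\alpha} e^{-2\pi^2 T i^2}}{(1 + n\tau_n^2 i^{-1-2\alpha} e^{-2\pi^2 T i^2})^2}, \tag{5.8}$$

$$s_n^2 = \sum_{i=1}^\infty \frac{l_i^2 \lambda_i}{1 + n\lambda_i\kappa_i^2} = \tau_n^2 \sum_{i=1}^\infty \frac{l_i^2\, i^{-1-2\alpha}}{1 + n\tau_n^2 i^{-1-2\alpha} e^{-2\pi^2 T i^2}}. \tag{5.9}$$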

By the Cauchy-Schwarz inequality the square of the bias (5.7) satisfies

$$|E_{\mu_0}\widehat{L\mu} - L\mu_0|^2 \le \|\mu_0\|_\beta^2 \sum_{i=1}^\infty \frac{l_i^2\, i^{-2\beta}}{(1 + n\tau_n^2 i^{-1-2\alpha} e^{-2\pi^2 T i^2})^2}. \tag{5.10}$$

Consider $(l_i) \in S^q$. By Lemma 6.1 (applied with $q = q$, $t = 2\beta$, $r = 0$, $u = 1 + 2\alpha$, $p = 2\pi^2 T$,
$v = 2$, and $N = n\tau_n^2$) the right side of this display can be further bounded by $\|\mu_0\|_\beta^2 \|l\|_q^2$ times
the square of the first term in the sum of two terms that defines $\varepsilon_n$. By Lemma 6.1 (applied
with $q = q$, $t = 2 + 4\alpha$, $r = 2\pi^2 T$, $u = 1 + 2\alpha$, $p = 2\pi^2 T$, $v = 2$, and $N = n\tau_n^2$), and again by
Lemma 6.1 (applied with $q = q$, $t = 1 + 2\alpha$, $r = 0$, $u = 1 + 2\alpha$, $p = 2\pi^2 T$, $v = 1$, and $N = n\tau_n^2$)
the right sides of (5.8) and (5.9) are bounded above by $\|l\|_q^2$ times the square of the second term
in the definition of $\varepsilon_n$.
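The three series (5.7)-(5.9) are easy to evaluate numerically once they are truncated, which gives a quick sanity check on the orders derived above. The following is only a minimal sketch: the function name `functional_series`, the truncation level, and the particular choices of $l_i$ and $\mu_{0,i}$ are hypothetical and not part of the paper.

```python
import math

def functional_series(n, tau, alpha, T, l, mu0):
    """Truncated versions of the series (5.7)-(5.9) for the prior
    lambda_i = tau^2 * i^(-1-2*alpha) and kappa_i^2 = exp(-2*pi^2*T*i^2).

    l and mu0 hold l_1..l_m and mu_{0,1}..mu_{0,m}; the truncation
    level m = len(l) is an assumption of this sketch.
    """
    bias = t2 = s2 = 0.0
    for i, (li, mi) in enumerate(zip(l, mu0), start=1):
        lam = tau**2 * i ** (-1.0 - 2.0 * alpha)
        kappa2 = math.exp(-2.0 * math.pi**2 * T * i**2)
        denom = 1.0 + n * lam * kappa2
        bias -= li * mi / denom                        # series (5.7)
        t2 += li**2 * n * lam**2 * kappa2 / denom**2   # series (5.8)
        s2 += li**2 * lam / denom                      # series (5.9)
    return bias, t2, s2

# Hypothetical example: l_i = i^(-q-1/2) with q = 0.5, mu_{0,i} = i^(-1.75).
m = 200
l = [i ** -1.0 for i in range(1, m + 1)]
mu0 = [i ** -1.75 for i in range(1, m + 1)]
bias, t2, s2 = functional_series(n=10**4, tau=1.0, alpha=1.0, T=0.1, l=l, mu0=mu0)
```

Each term of (5.8) equals the corresponding term of (5.9) multiplied by $n\lambda_i\kappa_i^2/(1+n\lambda_i\kappa_i^2) \le 1$, so the sketch always returns `t2 <= s2`.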

Consider $|l_i| \lesssim i^{-q-1/2}$. This follows the same lines as in the case of $(l_i) \in S^q$, except that we
use Lemma 6.2 instead of Lemma 6.1. In this case the upper bound for the standard deviation
of the posterior mean $t_n$ is of the order $\tau_n (\log n\tau_n^2)^{-(1+\alpha+q)/2}$.

Consequences (i)-(ii) follow by substitution. 

If $\lambda_i = e^{-\alpha i^2}$, then in case $(l_i) \in S^q$ we use Lemma 6.1 (with $q = q$, $t = 2\beta$, $r = 0$, $u = 0$,
$p = \alpha + 2\pi^2 T$, $v = 2$, and $N = n$), and Lemma 6.1 (with $q = q$, $t = 0$, $r = 2\alpha + 2\pi^2 T$, $u = 0$,
$p = \alpha + 2\pi^2 T$, $v = 2$, and $N = n$), and again Lemma 6.1 (with $q = q$, $t = 0$, $r = \alpha$, $u = 0$,
$p = \alpha + 2\pi^2 T$, $v = 1$, and $N = n$) to bound (5.10) by a multiple of $(\log n)^{-(\beta+q)}$, and (5.8)-(5.9)
by a multiple of $n^{-\alpha/(\alpha+2\pi^2 T)} (\log n)^{-q}$.

If $|l_i| \lesssim i^{-q-1/2}$, we use Lemma 6.2 (with $t = 1 + 2q + 2\beta$, $r = 0$, $u = 0$, $p = \alpha + 2\pi^2 T$,
$v = 2$, and $N = n$), and Lemma 6.2 (with $t = 1 + 2q$, $r = 2\alpha + 2\pi^2 T$, $u = 0$, $p = \alpha + 2\pi^2 T$,
$v = 2$, and $N = n$), and again Lemma 6.2 (with $t = 1 + 2q$, $r = \alpha$, $u = 0$, $p = \alpha + 2\pi^2 T$, $v = 1$,
and $N = n$) to bound (5.10) by a multiple of $(\log n)^{-(\beta+q)}$, and (5.8)-(5.9) by a multiple of
$n^{-\alpha/(\alpha+2\pi^2 T)} (\log n)^{-1/2-q}$.
5.4 Proof of Theorem 3.2

Under (1.2) the variable $\widehat{L\mu}$ is $N(E_{\mu_0}\widehat{L\mu}, t_n^2)$-distributed, for $t_n^2$ given in (5.8). It follows that
the coverage can be written, with $W$ a standard normal variable,

$$P\bigl(|W t_n + E_{\mu_0}\widehat{L\mu} - L\mu_0| \le -s_n z_{\gamma/2}\bigr). \tag{5.11}$$

The bias $|E_{\mu_0}\widehat{L\mu} - L\mu_0|$ and posterior spread $s_n^2$ are expressed as series in (5.7) and (5.9).

Because $W$ is centered, the coverage (5.11) is largest if the bias $E_{\mu_0}\widehat{L\mu} - L\mu_0$ is zero. It is
then at least $1 - \gamma$, because $t_n \le s_n$, and tends to exactly 1, because $t_n \ll s_n$.
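The coverage expression (5.11) is a Gaussian probability and can be evaluated in closed form, which makes the roles of the bias, $t_n$ and $s_n$ concrete. The sketch below is illustrative only: the function name `coverage` and the numerical inputs are hypothetical, and $z_{\gamma/2}$ is read as the lower $\gamma/2$ quantile of the standard normal distribution (so that $-s_n z_{\gamma/2}$ is positive), which is an assumption about the paper's convention.

```python
from statistics import NormalDist

def coverage(bias, t_n, s_n, gamma=0.05):
    """P(|W * t_n + bias| <= -s_n * z_{gamma/2}) for W standard normal.

    z_{gamma/2} is taken to be the lower gamma/2 quantile (a negative number),
    an assumed convention for this sketch.
    """
    std = NormalDist()
    half_width = -s_n * std.inv_cdf(gamma / 2)
    upper = std.cdf((half_width - bias) / t_n)
    lower = std.cdf((-half_width - bias) / t_n)
    return upper - lower

exact = coverage(bias=0.0, t_n=1.0, s_n=1.0)          # t_n = s_n, no bias
conservative = coverage(bias=0.0, t_n=0.5, s_n=1.0)   # t_n < s_n, no bias
collapsed = coverage(bias=10.0, t_n=0.5, s_n=1.0)     # bias much larger than s_n
```

With zero bias and $t_n = s_n$ the coverage equals the nominal level $1 - \gamma$; shrinking $t_n$ relative to $s_n$ pushes it towards 1, while a bias that dominates $s_n$ pushes it towards 0, matching the three regimes discussed in the proof.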

The supremum of the bias satisfies

$$B_n := \sup_{\|\mu_0\|_\beta \lesssim 1} |E_{\mu_0}\widehat{L\mu} - L\mu_0| \asymp (\log(n\tau_n^2))^{-(\beta+q)/2}. \tag{5.12}$$

The maximal bias $B_n$ is a decreasing function of the scaling parameter $\tau_n$, while the root
spread $s_n$ increases with $\tau_n$. The scaling rate $\tilde\tau_n \asymp (\log n)^{(1/2+\alpha-\beta)/2}$ in the statement of the
theorem balances $B_n$ with $s_n$.

Case (i). If $\tau_n \gg \tilde\tau_n$, then $B_n \ll s_n$. Hence the bias $|E_{\mu_0}\widehat{L\mu} - L\mu_0|$ in (5.11) is negligible
relative to $s_n$, uniformly in $\|\mu_0\|_\beta \lesssim 1$, and $P(|W t_n + E_{\mu_0}\widehat{L\mu} - L\mu_0| \le -s_n z_{\gamma/2}) \ge P(|W t_n| \le -s_n z_{\gamma/2} - |E_{\mu_0}\widehat{L\mu} - L\mu_0|) \to 1$.

Case (ii). If $\tau_n \asymp \tilde\tau_n$, then $B_n \asymp s_n$. If $b_n = |E_{\mu_0^n}\widehat{L\mu} - L\mu_0^n|$ is the bias at a sequence
$\mu_0^n$ that nearly assumes the supremum in the definition of $B_n$, we have that
$P(|W t_n + d b_n| \le -s_n z_{\gamma/2}) \ge P(|W t_n| \le s_n |z_{\gamma/2}| - d b_n) \to 1$ if $d$ is chosen sufficiently small. This is the coverage
at the sequence $d\mu_0^n$, which is bounded in $S^\beta$. On the other hand, using Lemma 6.3 it can be
seen that the bias at a fixed $\mu_0 \in S^\beta$ is of strictly smaller order than the supremum $B_n$, and
hence the coverage at a fixed $\mu_0$ is as in case (i).

Case (iii). If $\tau_n \ll \tilde\tau_n$, then $B_n \gg s_n$. If $b_n = |E_{\mu_0^n}\widehat{L\mu} - L\mu_0^n|$ is again the bias at a sequence
$\mu_0^n$ that (nearly) attains the supremum in the definition of $B_n$, we have that
$P(|W t_n + d b_n| \le -s_n z_{\gamma/2}) \le P(|W t_n| \ge d b_n - s_n |z_{\gamma/2}|) \to 0$ if $d$ is chosen sufficiently large. This is the coverage
at the sequence $d\mu_0^n$, which is bounded in $S^\beta$. By the same argument the coverage also tends
to zero for a fixed $\mu_0$ in $S^\beta$ with bias $b_n = |E_{\mu_0}\widehat{L\mu} - L\mu_0| \gg s_n \gg t_n$. For this we choose
$\mu_{0,i} = i^{-\beta-1/2-\delta'}$ for some $\delta' > 0$. By another application of Lemma 6.2 the bias at $\mu_0$ is of the
order

$$\sum_{i=1}^\infty \frac{l_i\mu_{0,i}}{1 + n\lambda_i\kappa_i^2} \asymp \sum_{i=1}^\infty \frac{i^{-q-\beta-\delta'-1}}{1 + n\tau_n^2 i^{-1-2\alpha} e^{-2\pi^2 T i^2}} \asymp (\log(n\tau_n^2))^{-(\beta+q+\delta')/2}.$$

Therefore if $\tau_n \le \tilde\tau_n (\log n)^{-\delta}$ for some $\delta > 0$, then $B_n \ge s_n (\log n\tau_n^2)^{\delta''}$ for some $\delta'' > 0$, and
hence taking $\delta' = \delta''$ we have $b_n \asymp B_n (\log(n\tau_n^2))^{-\delta'/2} \gg s_n \gg t_n$.

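Case (iv). In the proof of Theorem 3.1 we obtained $s_n \asymp t_n \asymp n^{-\alpha/(\alpha+2\pi^2 T)} (\log n)^{-q}$. If
$\mu_{0,i} l_i \gtrsim e^{-c i^2/2} i^{-q-1/2}$ for some $c < \alpha$, we have

$$|E_{\mu_0}\widehat{L\mu} - L\mu_0| = \sum_{i=1}^\infty \frac{l_i\mu_{0,i}}{1 + n\lambda_i\kappa_i^2} \gtrsim \sum_{i=1}^\infty \frac{e^{-c i^2} i^{-2q-1}}{(1 + n e^{-(\alpha+2\pi^2 T) i^2})^2} \asymp n^{-c/(\alpha+2\pi^2 T)} (\log n)^{-1/2-q} \gg s_n$$

by Lemma 6.2 (applied with $t = 1 + 2q$, $r = c$, $u = 0$, $p = \alpha + 2\pi^2 T$, $v = 2$, and $N = n$). Hence
$P(|W t_n + E_{\mu_0}\widehat{L\mu} - L\mu_0| \le -s_n z_{\gamma/2}) \le P(|W t_n| \ge |E_{\mu_0}\widehat{L\mu} - L\mu_0| - s_n |z_{\gamma/2}|) \to 0$.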

If the scaling rate is fixed to $\tau_n \equiv 1$, then it can be checked from (5.12) and the proof of
Theorem 3.1 that $B_n \ll s_n$, $B_n \asymp s_n$ and $B_n \gg s_n$ in the three cases $\alpha < \beta - 1/2$, $\alpha = \beta - 1/2$
and $\alpha > \beta - 1/2$, respectively. In the first and third cases the maximal bias and the root spread
differ by more than a logarithmic term $(\log n)^{\delta}$. It follows that the preceding analysis (i), (ii),
(iii) extends to this situation.
6 Appendix

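Lemma 6.1 For any $q, u, v \ge 0$, $t \ge -2q$, $p > 0$, and $0 \le r < vp$, as $N \to \infty$,

$$\sup_{\|\xi\|_q \le 1} \sum_{i=1}^\infty \frac{\xi_i^2\, i^{-t} e^{-r i^2}}{(1 + N i^{-u} e^{-p i^2})^v} \asymp N^{-r/p} (\log N)^{-t/2-q+ur/(2p)}.$$

Moreover, for every fixed $\xi \in S^q$, as $N \to \infty$,

$$N^{r/p} (\log N)^{t/2+q-ur/(2p)} \sum_{i=1}^\infty \frac{\xi_i^2\, i^{-t} e^{-r i^2}}{(1 + N i^{-u} e^{-p i^2})^v} \to 0.$$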



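Proof Let $I_N$ be the solution to $N i^{-u} e^{-p i^2} = 1$. In the range $i \le I_N$ we have $N i^{-u} e^{-p i^2} \le 1 + N i^{-u} e^{-p i^2} \le 2 N i^{-u} e^{-p i^2}$, while $1 \le 1 + N i^{-u} e^{-p i^2} \le 2$ in the range $i \ge I_N$. Thus

$$\sum_{i \le I_N} \frac{\xi_i^2\, i^{-t} e^{-r i^2}}{(1 + N i^{-u} e^{-p i^2})^v} \asymp \frac{1}{N^v} \sum_{i \le I_N} \xi_i^2 i^{2q}\, i^{uv-t-2q} e^{(vp-r) i^2} \le N^{-r/p} I_N^{-t-2q+ur/p} \sum_{i \le I_N} \xi_i^2 i^{2q},$$

since for $N$ large enough all terms $i^{uv-t-2q} e^{(vp-r) i^2}$ in the range $i \le I_N$ will be dominated by
$I_N^{uv-t-2q} e^{(vp-r) I_N^2}$, and $I_N$ solves the equation $N i^{-u} e^{-p i^2} = 1$. Similarly for the second range, we
have

$$\sum_{i > I_N} \frac{\xi_i^2\, i^{-t} e^{-r i^2}}{(1 + N i^{-u} e^{-p i^2})^v} \asymp \sum_{i > I_N} \xi_i^2\, i^{-t} e^{-r i^2} \le N^{-r/p} I_N^{-t-2q+ur/p} \sum_{i > I_N} \xi_i^2 i^{2q}.$$

Lemma 6.4 yields the upper bound for the supremum.

The lower bound follows by considering the sequence given by $\xi_i = i^{-q}$ for $i \sim I_N$ and
$\xi_i = 0$ otherwise, showing that the supremum is bigger than $N^{-r/p} (\log N)^{-t/2-q+ur/(2p)}$.

The preceding display shows that the sum over the terms $i > I_N$ is
$o\bigl(N^{-r/p} (\log N)^{-t/2-q+ur/(2p)}\bigr)$. Furthermore

$$N^{r/p} I_N^{t+2q-ur/p} \sum_{i \le I_N} \frac{\xi_i^2\, i^{-t} e^{-r i^2}}{(1 + N i^{-u} e^{-p i^2})^v} \le \sum_{i \le I_N} \xi_i^2 i^{2q}\, \frac{i^{uv-t-2q} e^{(vp-r) i^2}}{I_N^{uv-t-2q} e^{(vp-r) I_N^2}},$$

and this tends to zero by dominated convergence. Indeed, as noted before, for $N$ large enough
all terms $i^{uv-t-2q} e^{(vp-r) i^2}$ in the range $i \le I_N$ are upper bounded by $I_N^{uv-t-2q} e^{(vp-r) I_N^2} = N^{v-r/p} I_N^{-t-2q+ur/p}$, and by Lemma 6.4 $N^{v-r/p} I_N^{-t-2q+ur/p} \asymp N^{v-r/p} (\log N)^{-t/2-q+ur/(2p)} \to \infty$,
since $v - r/p > 0$. □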



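Lemma 6.2 For any $t, u, v \ge 0$, $p > 0$, and $0 \le r < vp$, as $N \to \infty$,

$$\sum_{i=1}^\infty \frac{i^{-t} e^{-r i^2}}{(1 + N i^{-u} e^{-p i^2})^v} \asymp \begin{cases} N^{-r/p} (\log N)^{-t/2+ur/(2p)} & \text{if } r \ne 0, \\ (\log N)^{-(t-1)/2} & \text{if } r = 0. \end{cases}$$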

Proof As in the preceding proof we split the infinite series in the sum over the terms $i \le I_N$
and $i > I_N$. For the first part of the sum we get

$$\sum_{i \le I_N} \frac{i^{-t} e^{-r i^2}}{(1 + N i^{-u} e^{-p i^2})^v} \asymp \sum_{i \le I_N} \frac{i^{uv-t} e^{(vp-r) i^2}}{N^v}.$$

Most certainly $N^v \cdot I_N^{-t} e^{-r I_N^2} = I_N^{uv-t} e^{(vp-r) I_N^2} \le \sum_{i \le I_N} i^{uv-t} e^{(vp-r) i^2}$. If $x^{uv-t} e^{(vp-r) x^2}$ as
a function of $x$ is strictly increasing, then the sum is upper bounded by the integral in the same
range, and the value at the right end-point. Otherwise $x^{uv-t} e^{(vp-r) x^2}$ first decreases, and then
increases, and therefore the sum is upper bounded by the integral, and values at both endpoints:

$$\begin{aligned}
\sum_{i \le I_N} i^{uv-t} e^{(vp-r) i^2} &\le \int_1^{I_N} x^{uv-t} e^{(vp-r) x^2}\, dx + e^{vp-r} + I_N^{uv-t} e^{(vp-r) I_N^2} \\
&\le \frac{1}{2(vp-r)} I_N^{uv-t-1} e^{(vp-r) I_N^2} (1 + o(1)) + e^{vp-r} + I_N^{uv-t} e^{(vp-r) I_N^2} \\
&= I_N^{uv-t} e^{(vp-r) I_N^2} (1 + o(1)),
\end{aligned}$$

by Lemma 6.5. Therefore by Lemma 6.4

$$\sum_{i \le I_N} \frac{i^{-t} e^{-r i^2}}{(1 + N i^{-u} e^{-p i^2})^v} \asymp I_N^{-t} e^{-r I_N^2} = N^{-r/p} I_N^{-t+ur/p} \asymp N^{-r/p} (\log N)^{-t/2+ur/(2p)}.$$

The other part of the sum satisfies

$$\sum_{i > I_N} \frac{i^{-t} e^{-r i^2}}{(1 + N i^{-u} e^{-p i^2})^v} \asymp \sum_{i > I_N} i^{-t} e^{-r i^2}.$$

Suppose $r > 0$. Again, the latter sum is lower bounded by $I_N^{-t} e^{-r I_N^2} \asymp N^{-r/p} (\log N)^{-t/2+ur/(2p)}$.
Since $x^{-t} e^{-r x^2}$ is decreasing, we get the following upper bound

$$\sum_{i \ge I_N} i^{-t} e^{-r i^2} \le I_N^{-t} e^{-r I_N^2} + \int_{I_N}^\infty x^{-t} e^{-r x^2}\, dx \le I_N^{-t} e^{-r I_N^2} + \frac{1}{2r} I_N^{-t-1} e^{-r I_N^2} = I_N^{-t} e^{-r I_N^2} (1 + o(1)) \asymp N^{-r/p} (\log N)^{-t/2+ur/(2p)},$$

where the upper bound for the integral follows from Lemma 6.5.

In case $r = 0$, we get $\sum_{i > I_N} i^{-t} \asymp I_N^{-t+1} \asymp (\log N)^{(-t+1)/2}$ (see Lemma 8.2 in Knapik et al.
(2011)). □



Lemma 6.3 For any $t \ge 0$, $u, p > 0$, $\mu \in S^{t/2}$, and $q > -t/2$, as $N \to \infty$

$$\Bigl( \sum_{i=1}^\infty \frac{|\mu_i|\, i^{-q-1/2}}{1 + N i^{-u} e^{-p i^2}} \Bigr)^2 \ll (\log N)^{-t/2-q}.$$

Proof We split the series in two parts, and bound the denominator $1 + N i^{-u} e^{-p i^2}$ by $N i^{-u} e^{-p i^2}$
or 1. By the Cauchy-Schwarz inequality, for any $r > 0$,

$$\begin{aligned}
\Bigl( \sum_{i \le I_N} \frac{|\mu_i|\, i^{-q-1/2}}{N i^{-u} e^{-p i^2}} \Bigr)^2 &\le \frac{1}{N^2} \sum_{i \le I_N} \mu_i^2\, i^{2u-2q-r} e^{2p i^2} \sum_{i \le I_N} i^{r-1} \\
&\lesssim \frac{I_N^r}{N^2} \sum_{i \le I_N} \mu_i^2 i^t\, i^{2u-2q-r-t} e^{2p i^2} \\
&= I_N^{-t-2q} \sum_{i \le I_N} \mu_i^2 i^t\, \frac{i^{2u-2q-r-t} e^{2p i^2}}{I_N^{2u-2q-r-t} e^{2p I_N^2}}.
\end{aligned}$$

The terms in the remaining series in the right side are bounded by a constant times $\mu_i^2 i^t$ for
large enough $N$ and all $i$ bigger than a fixed number, and tend to zero pointwise as $N \to \infty$,
and the sum tends to zero by the dominated convergence theorem. Therefore the square of the first part of
the sum in the assertion is $o(I_N^{-2q-t})$. As for the other part we have

$$\Bigl( \sum_{i > I_N} |\mu_i|\, i^{-q-1/2} \Bigr)^2 \le \sum_{i > I_N} \mu_i^2 i^t \sum_{i > I_N} i^{-1-2q-t} \lesssim I_N^{-t-2q} \sum_{i > I_N} \mu_i^2 i^t,$$

which completes the proof as $\mu \in S^{t/2}$, and $I_N^{-t-2q} \asymp (\log N)^{-t/2-q}$ by Lemma 6.4. □



Lemma 6.4 Let $I_N$ be the solution for $1 = N i^{-u} e^{-p i^2}$, for $u \ge 0$ and $p > 0$. Then

$$I_N \sim \sqrt{\frac{1}{p} \log N}.$$

Proof If $u = 0$ the assertion is obvious. Consider $u > 0$. The Lambert function $W$ satisfies the
following identity $z = W(z) \exp W(z)$. The equation $1 = N i^{-u} e^{-p i^2}$ can be rewritten as

$$\frac{2p}{u} N^{2/u} = \exp\Bigl( \frac{2p}{u} i^2 \Bigr) \frac{2p}{u} i^2,$$

and therefore by definition of $W(z)$

$$I_N = \sqrt{\frac{u}{2p}\, W\Bigl( \frac{2p}{u} N^{2/u} \Bigr)}.$$

By Corless et al. (1996) $W(x) \sim \log(x)$, which completes the proof. □
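The asymptotic statement of Lemma 6.4 can be checked numerically without the Lambert function by solving the defining equation on the log scale. The sketch below is illustrative only: the helper `solve_I_N`, the bisection scheme, and the parameter values ($u = 3$, $p = 2\pi^2 \cdot 0.1$, and the grid of $\log N$) are hypothetical choices, not taken from the paper.

```python
import math

def solve_I_N(log_N, u, p):
    """Solve log N - u*log(x) - p*x^2 = 0 for x >= 1 by bisection.

    Works with log N directly, so very large N cause no overflow.
    Requires log_N > p so that the root lies to the right of 1.
    """
    def f(x):
        return log_N - u * math.log(x) - p * x * x

    lo, hi = 1.0, math.sqrt(log_N / p) + 1.0  # f(lo) > 0 >= f(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u, p = 3.0, 2 * math.pi**2 * 0.1
# Ratio of the exact solution to the leading term sqrt(log N / p).
ratios = [solve_I_N(L, u, p) / math.sqrt(L / p) for L in (50.0, 5000.0, 500000.0)]
```

The ratios increase towards 1 as $\log N$ grows, although slowly, because the correction term is of relative order $u \log I_N / \log N$.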



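Lemma 6.5 1. For $\gamma \in \mathbb{R}$, $\zeta > 0$ we have, as $K \to \infty$,

$$\int_1^K x^{\gamma} e^{\zeta x^2}\, dx \sim \frac{1}{2\zeta} K^{\gamma-1} e^{\zeta K^2}.$$

2. For $K > 0$, $\gamma > 0$, $\zeta > 0$ we have

$$\int_K^\infty x^{-\gamma} e^{-\zeta x^2}\, dx \le \frac{1}{2\zeta} K^{-\gamma-1} e^{-\zeta K^2}.$$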



Proof First integrating by substitution y = x 2 and then by parts proves the lemma, with the 
help of the dominated convergence theorem in case 1. □ 
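The tail bound is used in the proof of Lemma 6.2 in the form $\int_K^\infty x^{-\gamma} e^{-\zeta x^2}\, dx \le \frac{1}{2\zeta} K^{-\gamma-1} e^{-\zeta K^2}$; assuming this is the content of part 2 of Lemma 6.5, it can be compared with a direct numerical integral. The following is a rough sketch under that assumption: the function names, the midpoint rule, the finite integration window, and the parameter values are all hypothetical.

```python
import math

def tail_integral(K, gamma, zeta, width=10.0, steps=100_000):
    """Midpoint-rule approximation of the integral of x^(-gamma) * exp(-zeta * x^2)
    over [K, K + width]; the (tiny) part beyond K + width is ignored."""
    h = width / steps
    total = 0.0
    for j in range(steps):
        x = K + (j + 0.5) * h
        total += x ** (-gamma) * math.exp(-zeta * x * x)
    return total * h

def tail_bound(K, gamma, zeta):
    """Right-hand side of the assumed bound: K^(-gamma-1) * exp(-zeta*K^2) / (2*zeta)."""
    return K ** (-gamma - 1.0) * math.exp(-zeta * K * K) / (2.0 * zeta)

ratio_3 = tail_integral(3.0, 1.0, 1.0) / tail_bound(3.0, 1.0, 1.0)
ratio_6 = tail_integral(6.0, 1.0, 1.0) / tail_bound(6.0, 1.0, 1.0)
```

Both ratios stay below 1, and the ratio moves closer to 1 for the larger value of $K$, which is consistent with the bound being sharp up to a factor $1 + o(1)$ as $K \to \infty$.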